Take Home Exercise 1

Visual Analytics of the demographic and financial characteristics of residents in City of Engagement

Author

Oh Jia Wen

Published

May 6, 2023

Modified

May 6, 2023

1. OVERVIEW

City of Engagement is a small city located at Country of Nowhere, with a total population of 50,000, serving as an agriculture region. The local council of the city is in the midst of preparing the Local Plan 2023.

1.1 The Task

In this take-home exercise, you are required to apply the concepts and methods you had learned in Lesson 1-4 to reveal the demographic and financial characteristics of the city of Engagement, using appropriate static and interactive statistical graphics methods.

2. Datasets

Data has been collected by the local council of the city. The survey sampled 1,000 respondents to collect data related to their household demographic, spending patterns and among, other things. The data is stored in two separate files ranging from Mar 2022 to Feb 2023 :

1) Participants.csv 2) FinancialJourval.csv
rows 1,011 1,513,636
variables 7 4

2.1 Metadata

File Columns Description
participants.csv

3. Data Preparation

3.1 Install R-packages

Using p_load() of pacman package to load and install the following libraries:

  • ggiraph : For creating interactive ‘ggplot’ graphics

  • plotly : For creating interactive statistical graphs

  • patchwork : For combining multiple ggplot2 graphs into one figure

  • tidyverse : A collection of R packages use in everyday data analyses. It is able to support data science, data wrangling, and analysis.

  • knitr: For dynamic report generation

  • ggstatsplot: For creating graphics with details from statistical tests included and its plot

  • paletteer: Collection of color palettes

  • wesanderson: Wes Anderson’s theme Palette Generator

#to double check if got use DT, scales, ggpubr
pacman::p_load(ggiraph, plotly, patchwork, DT, tidyverse,
               knitr,scales,ggstatsplot,paletteer,wesanderson,
               ggpubr,crosstalk,gganimate)
options(scipen = 999)
Tip

options(scipen = 999) : The above code removes scientific notation in our exercise.

3.2 Import Data

3.2.1 Import participants dataset

participants <- read_csv("data/Participants.csv")

3.2.2 Load participants

# A tibble: 6 × 7
  participantId householdSize haveKids   age educationLevel      interestGroup
          <dbl>         <dbl> <lgl>    <dbl> <chr>               <chr>        
1             0             3 TRUE        36 HighSchoolOrCollege H            
2             1             3 TRUE        25 HighSchoolOrCollege B            
3             2             3 TRUE        35 HighSchoolOrCollege A            
4             3             3 TRUE        21 HighSchoolOrCollege I            
5             4             3 TRUE        43 Bachelors           H            
6             5             3 TRUE        32 HighSchoolOrCollege D            
# ℹ 1 more variable: joviality <dbl>
head(participants)

3.2.3 Import Financial Journal dataset

financial_journal <- read_csv("data/FinancialJournal.csv")

3.2.4 Load Financial Journal

# A tibble: 6 × 4
  participantId timestamp           category  amount
          <dbl> <dttm>              <chr>      <dbl>
1             0 2022-03-01 00:00:00 Wage      2473. 
2             0 2022-03-01 00:00:00 Shelter   -555. 
3             0 2022-03-01 00:00:00 Education  -38.0
4             1 2022-03-01 00:00:00 Wage      2047. 
5             1 2022-03-01 00:00:00 Shelter   -555. 
6             1 2022-03-01 00:00:00 Education  -38.0
head(financial_journal)

3.3 Data Wrangling

As seen from the two data tables above, there are several quality issues. 1) for Participants.csv

Inaccurate data types. - participantID is in dbl instead of chr -timestamp format is in POSIX instead of chr - householdSize is in dbl. We will convert it to ord. -reformat age group after looking at the min and max - round up jovality to 2 d.p.

Code with mutate from dplyr to reformat participantID from dbl to chr. Check for duplicates(remove 1,113 rows)

groupby participantID remove duplicates reformat timestamp to year and month create new variables for income, expenses, cashflow

The function distinct() [dplyr package] can be used to keep only unique/distinct rows from a data frame.

unique(financial_journal$category)
[1] "Wage"           "Shelter"        "Education"      "RentAdjustment"
[5] "Food"           "Recreation"    
Show the code
#create new dataset 
participants_new <- participants %>%
    mutate(
          participantId = as.character(participantId),
          #binned joviality to 5-class variables 
          joviality_bins = cut(joviality, breaks = c(0.0,0.2,0.4,0.6,0.8,1.0))
          )

#reformat householdSize to Ordinal 
    participants_new$householdSize <- factor(participants$householdSize,
                                      levels = c("1", "2", "3"), 
                                      ordered = TRUE) 
#reformat age group 
participants_new$age_group <- factor(ifelse(participants$age < 20, "Under 20",
                ifelse(participants$age < 30, "20-29",
                    ifelse(participants$age < 40, "30-39",
                      ifelse(participants$age < 50, "40-49", "Above 50")))),
                levels = c("Under 20", "20-29", "30-39", "40-49", "Above 50"),
                ordered= TRUE)

#reformat education level to Ordinal 
participants_new$educationLevel <- factor(participants$educationLevel, 
                                      levels = c("Low", "HighSchoolOrCollege", 
                                                 "Bachelors", "Graduate"
                                                 ), 
                                      ordered = TRUE)  

#round up joviality to 2 decimal places 
participants_new$joviality <- round(participants$joviality, 2) 
  
participants_new
# A tibble: 1,011 × 9
   participantId householdSize haveKids   age educationLevel      interestGroup
   <chr>         <ord>         <lgl>    <dbl> <ord>               <chr>        
 1 0             3             TRUE        36 HighSchoolOrCollege H            
 2 1             3             TRUE        25 HighSchoolOrCollege B            
 3 2             3             TRUE        35 HighSchoolOrCollege A            
 4 3             3             TRUE        21 HighSchoolOrCollege I            
 5 4             3             TRUE        43 Bachelors           H            
 6 5             3             TRUE        32 HighSchoolOrCollege D            
 7 6             3             TRUE        26 HighSchoolOrCollege I            
 8 7             3             TRUE        27 Bachelors           A            
 9 8             3             TRUE        20 Bachelors           G            
10 9             3             TRUE        35 Bachelors           D            
# ℹ 1,001 more rows
# ℹ 3 more variables: joviality <dbl>, joviality_bins <fct>, age_group <ord>
#check min and max age of residents in COE. 
min(participants$age)
[1] 18
max(participants$age)
[1] 60
#remove duplicate rows for all columns
financial_journal_lessdup <- financial_journal %>% 
  distinct()

You can use group_by() function along with the summarise() from dplyr package to find the group by sum in R DataFrame, group_by() returns the grouped_df ( A grouped Data Frame) and use summarise() on grouped df results to get the group by sum.

scales package (part of the Tidyverse) does exactly this:

Show the code
#create new dataset 
grouped_data <- financial_journal_lessdup %>%
  
#recode ID from dbl to chr, year_mth
    mutate(participantId = as.character(participantId),
         year_mth = format(as.Date(financial_journal_lessdup$timestamp), "%Y-%m"),
         amount = abs(round(amount,2)),
         .before = 3) %>%
  
#group the columns in the following order 
  group_by(participantId,year_mth, category) %>%
  summarize(total_amount = sum(amount)) 

# Pivot the data frame to have categories as columns
pivoted_fj <- grouped_data %>%
  pivot_wider(names_from = "category", values_from = "total_amount", values_fill = 0)

# Add a new column with mixed categories
pivoted_fj$Expenses <- pivoted_fj$Education + pivoted_fj$Food + pivoted_fj$Recreation + pivoted_fj$Shelter +pivoted_fj$RentAdjustment
pivoted_fj$Income <- pivoted_fj$Wage
pivoted_fj$Cashflow <- pivoted_fj$Income - pivoted_fj$Expenses
pivoted_fj$Shelter <- pivoted_fj$Shelter + pivoted_fj$RentAdjustment

# Output the pivoted data frame
pivoted_fj
# A tibble: 10,691 × 11
# Groups:   participantId, year_mth [10,691]
   participantId year_mth Education  Food Recreation Shelter   Wage
   <chr>         <chr>        <dbl> <dbl>      <dbl>   <dbl>  <dbl>
 1 0             2022-03       38.0  268.      349.     555. 11932.
 2 0             2022-04       38.0  266.      219.     555.  8637.
 3 0             2022-05       38.0  265.      383.     555.  9048.
 4 0             2022-06       38.0  257.      466.     555.  9048.
 5 0             2022-07       38.0  270.     1069.     555.  8637.
 6 0             2022-08       38.0  262.      314.     555.  9459.
 7 0             2022-09       38.0  256.      295.     555.  9048.
 8 0             2022-10       38.0  267.       25.0    555.  8637.
 9 0             2022-11       38.0  261       377.     555.  9048.
10 0             2022-12       38.0  266.      357.     555.  9048.
# ℹ 10,681 more rows
# ℹ 4 more variables: RentAdjustment <dbl>, Expenses <dbl>, Income <dbl>,
#   Cashflow <dbl>

The function distinct() [dplyr package] can be used to keep only unique/distinct rows from a data frame. If there are duplicate rows, only the first row is preserved.

colSums(pivoted_fj[-1] !=0)
      year_mth      Education           Food     Recreation        Shelter 
         10691           3018          10691           9492          10560 
          Wage RentAdjustment       Expenses         Income       Cashflow 
         10691             72          10691          10691          10691 

Check for missing values

#Check for missing values
any(is.na(participants_new))
[1] FALSE
any(is.na(pivoted_fj))
[1] FALSE

Merge Data Table

Show the code
#join both data sets 
resident_profile <- full_join(participants_new, pivoted_fj, 
                       by = c("participantId" = "participantId")) %>%
#relocate columns to the front (by importance)
                    relocate(year_mth, .after =participantId) %>%
                    relocate(Cashflow, .after = year_mth) %>%
                    relocate(age_group, .after = Cashflow) %>%
                    relocate(educationLevel, .after = age_group) %>%
                    relocate(Income, .after = haveKids) %>%
                    relocate(Expenses , .after = Income) 
resident_profile %>%
    select(c(1:18))
# A tibble: 10,691 × 18
   participantId year_mth Cashflow age_group educationLevel      householdSize
   <chr>         <chr>       <dbl> <ord>     <ord>               <ord>        
 1 0             2022-03    10722. 30-39     HighSchoolOrCollege 3            
 2 0             2022-04     7559. 30-39     HighSchoolOrCollege 3            
 3 0             2022-05     7808. 30-39     HighSchoolOrCollege 3            
 4 0             2022-06     7733. 30-39     HighSchoolOrCollege 3            
 5 0             2022-07     6704. 30-39     HighSchoolOrCollege 3            
 6 0             2022-08     8291. 30-39     HighSchoolOrCollege 3            
 7 0             2022-09     7904. 30-39     HighSchoolOrCollege 3            
 8 0             2022-10     7752. 30-39     HighSchoolOrCollege 3            
 9 0             2022-11     7817. 30-39     HighSchoolOrCollege 3            
10 0             2022-12     7832. 30-39     HighSchoolOrCollege 3            
# ℹ 10,681 more rows
# ℹ 12 more variables: haveKids <lgl>, Income <dbl>, Expenses <dbl>, age <dbl>,
#   interestGroup <chr>, joviality <dbl>, joviality_bins <fct>,
#   Education <dbl>, Food <dbl>, Recreation <dbl>, Shelter <dbl>, Wage <dbl>
kable(head(resident_profile), "simple")
participantId year_mth Cashflow age_group educationLevel householdSize haveKids Income Expenses age interestGroup joviality joviality_bins Education Food Recreation Shelter Wage RentAdjustment
0 2022-03 10722.01 30-39 HighSchoolOrCollege 3 TRUE 11931.95 1209.94 36 H 0 (0,0.2] 38.01 268.26 348.68 554.99 11931.95 0
0 2022-04 7558.67 30-39 HighSchoolOrCollege 3 TRUE 8636.88 1078.21 36 H 0 (0,0.2] 38.01 265.79 219.42 554.99 8636.88 0
0 2022-05 7807.63 30-39 HighSchoolOrCollege 3 TRUE 9048.16 1240.53 36 H 0 (0,0.2] 38.01 264.54 382.99 554.99 9048.16 0
0 2022-06 7732.59 30-39 HighSchoolOrCollege 3 TRUE 9048.16 1315.57 36 H 0 (0,0.2] 38.01 256.90 465.67 554.99 9048.16 0
0 2022-07 6704.27 30-39 HighSchoolOrCollege 3 TRUE 8636.88 1932.61 36 H 0 (0,0.2] 38.01 270.13 1069.48 554.99 8636.88 0
0 2022-08 8290.55 30-39 HighSchoolOrCollege 3 TRUE 9459.44 1168.89 36 H 0 (0,0.2] 38.01 261.76 314.13 554.99 9459.44 0

4. Exploratory Data Analysis (EDA)

Overview of City of Engagement’s resident.

Show the code
#create tooltip to display age group 
participants_new$tooltip <-c(paste0(
  "Age Group:", participants_new$age_group))

#Bar chart for resident's age distribution
p1 <- ggplot(data= participants_new,
      aes(x = age_group)) +
      geom_bar_interactive(aes(tooltip = participants_new$tooltip, stackgroups = TRUE,
                               data_id= age_group)) + 
      scale_fill_manual(values = wes_palette("Chevalier1")) +
      xlab("Age Group") +
      ylab("No.of\nResidents") +
      theme(axis.text.x=element_text(size=5)) +
      theme(axis.title.y=element_text(size=10)) +
      ylim(0,300) 

#Bar chart for resident's household size distribution
p2 <- ggplot(data= participants_new,
      aes(x = householdSize)) +
      geom_bar_interactive(aes(tooltip = participants_new$tooltip, stackgroups = TRUE,
                               data_id= age_group)) +
      scale_fill_manual(values = wes_palette("Chevalier1")) +
      xlab("Size of Household") +
      ylab("No.of Residents") +
      theme(axis.text.x=element_text(size=5)) +
      theme(axis.title.y=element_text(size=10)) +
      ylim(0,400) 


#Bar chart for resident's education level
p3 <- ggplot(data= participants_new,
      aes(x = educationLevel)) +
      geom_bar_interactive(aes(tooltip = participants_new$tooltip, stackgroups = TRUE,
                               data_id= age_group)) +
      xlab("Education Level") +
      ylab("No.of\nResidents") +
      theme(axis.text.x=element_text(size=5)) +
      theme(axis.title.y=element_text(size=10)) +
      ylim(0,600)

#Bar chart to visualize if residents have kids 
p4 <- ggplot(data= participants_new,
      aes(x = haveKids)) +
      geom_bar_interactive(aes(tooltip = participants_new$tooltip, stackgroups = TRUE,
                               data_id= age_group)) +
      ylab("No.of\nResidents") +
      theme(axis.text.x=element_text(size=5)) +
      theme(axis.title.y=element_text(size=10)) +
      ylim(0,850) 

#Bar chart for residents' interest group 
p5 <- ggplot(data= participants_new,
      aes(x = interestGroup)) +
      geom_bar_interactive(aes(tooltip = participants_new$tooltip, stackgroups = TRUE,
                               data_id= age_group)) +
      xlab("Interest Group") +
      ylab("No.of\nResidents") +
      theme(axis.text.x=element_text(size=5)) +
      theme(axis.title.y=element_text(size=10)) +
      ylim(0,150) 

#figure design layout 
design <- "
  132
  132
  132
  554
"

girafe(code = print(p1 + p2 + p3 + p4 + p5  + 
                      plot_layout(design = design, ) + 
                      plot_annotation(title = 
                    "Demographics Insights of residents \n in City of Engagement",
                     theme = theme(plot.title = element_text(size = 20, hjust=0.5))
                      )), 
       width_svg = 12,
       height_svg = 6,
       options = list(
         opts_hover(css = "fill: #02401B;"),
         opts_hover_inv(css = "opacity:0.2;")
         )
       ) 

Insights:

Interactivity

Click on the graph and hover around each demographics. The respective age group will be displayed.

4.1 Chi-square test for Age group and Education Level

ggbarstats(data = resident_profile, 
           x = educationLevel, y = age_group,
           package = "wesanderson", palette = "Chevalier1"
           )

4.2 Differences in Joviality based on Education Level

As defined, Joviality indicates the participant’s overall happiness at the start of the study. We will like to found out if there is a difference in Joviality based on Education Level. Before testing our hypothesis, we will perform a normality assumption test at 95% confidence level.

4.2.1 Normality Assumption Test

At 95% confidence level:

Ho: the mean Joviality across different Education Level is the same

H1: the mean Joviality across different Education Level is not the same

#compute statistics 
average_joviality <- round(mean(resident_profile$joviality),2)
median_joviality <- round(median(resident_profile$joviality),2)
maximum_joviality <- as.numeric(round((IQR(resident_profile$joviality)*1.5) +
                quantile(resident_profile$joviality,0.75)),2)
min_joviality <- as.integer(min(resident_profile$joviality))
IQR_joviality <- round(IQR(resident_profile$joviality)*1.5)
average_joviality
[1] 0.47
median_joviality
[1] 0.45
maximum_joviality
[1] 1
min_joviality
[1] 0
IQR_joviality
[1] 1
ggplot(data = resident_profile ,
  aes(x=joviality, after_stat(count), color = educationLevel)) +
  geom_density(adjust=1, position ="stack") +
  xlim(0,1) +
  geom_vline(aes(xintercept = average_joviality), col ="grey", linewidth=1 ) +
  geom_vline(aes(xintercept = median_joviality), col ="orange", linewidth=1 ) 

ggplot(data=resident_profile,
       aes(sample = joviality)) + 
  stat_qq() +
  stat_qq_line()

  ggplot(data = resident_profile,
         aes(x= joviality)) + 
  geom_histogram(bins=10) +
  xlab("Joviality") +
  ylab("No.of Residents") +
  geom_vline(aes(xintercept = average_joviality), col ="red", linewidth=1 ) +
  annotate("text", x=0.37, y= 1400, label="Average Joviality:", size=4, color = "red") +    annotate("text", x=0.37, y= 1300, label=format(average_joviality, big.mark=","), size=4, color = "red") 

  geom_vline(aes(xintercept = median_joviality), col ="green", linewidth=1 ) 
mapping: xintercept = ~median_joviality 
geom_vline: na.rm = FALSE
stat_identity: na.rm = FALSE
position_identity 

Based on the result above, we concluded that there is enough statistical evidence to reject the null hypothesis. Since the p-values fall below (p < 0.05), we will use the Wilcoxon test.

4.2.2 Kruskal-Wallis Test for Joviality across Education Level

We will test the following hypothesis at 95% Confidence Level:

Ho : the median Joviality across different education level is the same

H1: the median Joviality across different education level is not the same

Show the code
ggbetweenstats(data = resident_profile,
       x= educationLevel, y= joviality, type ="np",
       xlab= "Education Level", ylab = "Joviality",
       title = "Comparison of Joviality across Education Level",
       pairwise.comparisons = TRUE, pairwise.display ="ns", conf.level = 0.95,
       package = "wesanderson", palette = "Chevalier1"
       )

As seen above, the P-value is lower than the 0.05. As such, there is enough statistical evidence to reject the null hypothesis that the median joviality across education level is the same.

Additionally, we want to find out if there any distinct similarities between the district. Through the graph above, we discovered that not all pair comparison are statistically significant. The pair (Low and HighSchoolOrCollege) is not statistically significant with a P-value of 0.14, which is greater than 0.05. Thus, we cannot reject the null hypothesis that there is not differences between the joviality level between the pair.

4.3 Association between Age group and Joviality

We discovered that there is a statistical difference in joviality across Education Level, we would like to examine if the same applies and if there is an association between the age group. As Joviality is a continuous variable, we used mutate from dplyr to split it into 5-class variables, with each range at 20%.

At 95% confidence level,

Ho : No association exists between the age group and joviality level

H1: Association exists between the age group and joviality level

ggbarstats(data = resident_profile, 
           x = joviality_bins, y = age_group,
          title = "Comparison of Joviality across age-group",
          pairwise.comparisons = TRUE, pairwise.display ="s", conf.level = 0.95,
          package = "wesanderson", palette = "Chevalier1"
           )

##KIV TO COME BACK AND EDIT. 
p3 <- ggplot(data= resident_profile,
      aes(x = Income)) +
      geom_histogram(bins=30,            
                 color="black",      
                 fill= "darkgrey") +
      xlab("Income") +
      ylab("No.of\nResidents") +
      ylim(0,2100) +
      xlim(-1000,20000) +
      ggtitle("Income Distribution of Residents ") +
      theme(plot.title = element_text(hjust = 0.5))
p3

5. Financial Health Insights

to include graphs of ID financial health over the 2 years.

5.1 Resident’s Financial Health

Show the code
tooltip_css <- "background-color:lightgrey; #<<
font-style:bold; color:black;" #<<

ie <-  ggplot(data=resident_profile) +
     geom_point_interactive (aes(x=Expenses, y= Income, 
                                 tooltip = participantId, data_id = Cashflow)) +
     xlim(0,5000)

girafe(                                  
  ggobj = ie,                             
  width_svg = 6,                         
  height_svg = 6*0.618,
  options = list( #<<
    opts_tooltip(css = tooltip_css), #<<
    opts_hover_inv(css = "opacity:0.2;") #<<
  )                                        
)   
Update

7 rows have been excluded in the graph

5.2 Differences in joviality across income

We want to find out if there are any differences in joviality across income.

The following considerations have been made:

  • assume income increases over the years

  • opacity included to highlight the contrast

  • Tooltip to indicate variables

    • participantID

    • Income

    • Cashflow

We will be using plot_ly for the interactive graph.

plot_ly(data = resident_profile, 
             x = ~joviality, y = ~Income,
        hovertemplate = ~paste("<br>Participant's ID:",participantId,
                               "<br>Cashflow:", Cashflow,
                              "<br>Income:", Income,
                               "<br>Expenses:", Expenses),
             
            type = "scatter",
            mode = "markers",
            marker = list(opacity = 0.7,sizemode = "diameter", 
                          line = list(width =0.1, color = "white"))) |>

#add title and labels to axis 
        layout(title = "Interactive scatterplot of Income vs Joviality" ,
         xaxis = list(title = "Joviality level") ,
         yaxis = list(title = "Income"))

Insights

  • When income exceeds $15k, joviality level decreases

  • it is similar to income range of $10-$15k where majority have low joviality

  • Joviality level are more spread out when income range below $5k

Note

Happiness level of joviality ranges from 0 to 1. A score of 0.4 indicates that participant is dull while a score of 0.8 indicates joy.

DT::datatable(resident_profile, class= "compact")
d <- highlight_key(resident_profile) 
p <- ggplot(d, 
            aes(Expenses, 
                Income)) + 
  geom_point(size =1) 

gg <- highlight(ggplotly(p),        
                "plotly_selected")  

dt <- DT::datatable(d,
                    selection =list(mode="single",
                                    target="column"))
crosstalk::bscols(gg,               
                  dt, 
                  widths = 6)        
fj_cat <- financial_journal_lessdup %>%
  
#recode ID from dbl to chr, year_mth
    mutate(participantId = as.character(participantId),
         year = format(as.Date(financial_journal_lessdup$timestamp), "%Y"),
         mth = format(as.Date(financial_journal_lessdup$timestamp), "%m"),
         amount = abs(round(amount,2)),
         .before = 3) %>%
  
#group the columns in the following order 
  group_by(year, mth, category) %>%
  summarize(total_amount = sum(amount))

fj_cat
# A tibble: 62 × 4
# Groups:   year, mth [12]
   year  mth   category       total_amount
   <chr> <chr> <chr>                 <dbl>
 1 2022  03    Education            14354.
 2 2022  03    Food                327829.
 3 2022  03    Recreation          649580.
 4 2022  03    RentAdjustment       53504.
 5 2022  03    Shelter             631623.
 6 2022  03    Wage               6402720.
 7 2022  04    Education            11424.
 8 2022  04    Food                304282.
 9 2022  04    Recreation          389688.
10 2022  04    RentAdjustment        1429.
# ℹ 52 more rows
ggplot(data = fj_cat,
       aes(x= category, y=total_amount, size = total_amount)) + 
         geom_point(alpha=0.8, show.legend= FALSE) +
         scale_size(range =c(2,12)) +
         labs(titles = "Year{as.integer(frame_time)}",
              x = "Category",
              y = "Total_amount") +
         transition_time(as.integer(year)) +
         ease_aes("linear") +
         scale_y_continuous(limit =c(0,2000000))

Average Cashflow available for Residents across Age Group

Show the code
tooltip_css <- "background-color:#C7B19C; #<<
font-style:bold; color:#446455;" #<<

tooltip <- function(y, ymax, accuracy = 1) {   #<<
  mean <- scales::number(y, accuracy = accuracy) #<<
  sem <- scales::number(ymax - y, accuracy = accuracy) #<<
  paste("Average Cashflow:", mean, "+/-", sem) #<<
} #<<

pp <- ggplot(data=resident_profile, 
                   aes(x = age_group),
) +
  stat_summary(aes(y = Cashflow, 
                   tooltip = after_stat(  #<<
                     tooltip(y, ymax))),  #<<
    fun.data = "mean_se", 
    geom = GeomInteractiveCol,  #<<
    fill = "#D3DDDC"
  ) +
  stat_summary(aes(y = Cashflow),
    fun.data = mean_se,
    geom = "errorbar", width = 0.2, linewidth = 0.2
  ) +
  labs(title="Average Cashflow available to Residents by age_group") +
  ylab("Total Cashflow") +
  xlab("Age Group") 

girafe(ggobj = pp,
       width_svg = 8,
       height_svg = 8*0.618,
       options = list(    #<<
       opts_tooltip(    #<<
       css = tooltip_css)) #<<
       )